Overview
Dataset statistics
| Number of variables | 6 |
|---|---|
| Number of observations | 21013536 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 961.9 MiB |
| Average record size in memory | 48.0 B |
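The memory figures above are consistent with every column storing one 8-byte value per row. The snippet below is only an arithmetic check of the reported numbers, not a description of how ydata-profiling measures memory; the 8-byte width is an assumption inferred from the 48.0 B average record size across 6 variables.

```python
# Sanity check of the reported memory figures.
# Assumption: each of the 6 columns stores one 8-byte value per row
# (inferred from the 48.0 B average record size, not stated in the report).
N_ROWS = 21_013_536
N_COLS = 6
BYTES_PER_VALUE = 8
MIB = 1024 ** 2

record_bytes = N_COLS * BYTES_PER_VALUE          # bytes per observation
total_mib = N_ROWS * record_bytes / MIB          # whole dataset
column_mib = N_ROWS * BYTES_PER_VALUE / MIB      # a single column

print(f"record size: {record_bytes} B")
print(f"total:       {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under that assumption the totals agree with the 961.9 MiB reported for the dataset and the 160.3 MiB reported as the memory size of each variable.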
Variable types
| Numeric | 4 |
|---|---|
| Unsupported | 1 |
| DateTime | 1 |
Reproduction
| Analysis started | 2025-05-09 11:43:05.558394 |
|---|---|
| Analysis finished | 2025-05-09 11:51:00.703021 |
| Duration | 7 minutes and 55.14 seconds |
| Software version | ydata-profiling v4.16.1 |
| Download configuration | config.json |
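A report of this kind can be regenerated with ydata-profiling along the following lines. This is a minimal sketch, not the configuration actually used for this run: the CSV path, the report title, and the `build_report` helper are hypothetical, and the settings stored in config.json are not reproduced here.

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a ratings CSV and write an HTML report (illustrative sketch)."""
    # Imported lazily so this module still loads where the optional
    # third-party packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Hypothetical input file; parse_dates targets the Date column
    # that appears in this report.
    df = pd.read_csv(csv_path, parse_dates=["Date"])

    # Default settings; the original run may have used different options.
    profile = ProfileReport(df, title="Ratings profile")
    profile.to_file(out_path)
```

Calling `build_report("ratings.csv")` would write `report.html` next to the script. With roughly 21 million rows, expect a run time of several minutes, in line with the duration recorded above.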
Variables
RatingID
Real number (ℝ)
Uniform, Unique
| Distinct | 21013536 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10506768 |
| Minimum | 1 |
| Maximum | 21013536 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1050677.8 |
| Q1 | 5253384.8 |
| median | 10506768 |
| Q3 | 15760152 |
| 95-th percentile | 19962859 |
| Maximum | 21013536 |
| Range | 21013535 |
| Interquartile range (IQR) | 10506768 |
Descriptive statistics
| Standard deviation | 6066085.5 |
|---|---|
| Coefficient of variation (CV) | 0.57735026 |
| Kurtosis | -1.2 |
| Mean | 10506768 |
| Median Absolute Deviation (MAD) | 5253384 |
| Skewness | -2.1117456 × 10⁻¹⁶ |
| Sum | 2.2078436 × 10¹⁴ |
| Variance | 3.6797393 × 10¹³ |
| Monotonicity | Strictly increasing |
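Because RatingID takes every integer from 1 to 21013536 exactly once, its statistics can be checked against closed-form expressions for the sequence 1..N. The sketch below assumes the reported variance is the sample variance (ddof=1), which is an assumption about the profiler's defaults; at this N the population and sample values agree to the precision shown in the tables. The kurtosis line uses the population formula for a discrete uniform distribution.

```python
import math

N = 21_013_536  # number of observations; RatingID runs from 1 to N

total = N * (N + 1) // 2        # sum of 1..N
mean = (N + 1) / 2
variance = N * (N + 1) / 12     # sample variance (ddof=1) of 1..N, assumed convention
std = math.sqrt(variance)
cv = std / mean                 # approaches 1/sqrt(3) as N grows

# Population excess kurtosis of a discrete uniform distribution on 1..N.
excess_kurtosis = -1.2 * (N**2 + 1) / (N**2 - 1)

print(f"sum={total:.7e}  mean={mean:.1f}  variance={variance:.7e}")
print(f"std={std:.1f}  cv={cv:.8f}  kurtosis={excess_kurtosis:.4f}")
```

The coefficient of variation close to 1/√3 and the kurtosis close to -1.2 are what flag the column as Uniform: it behaves like a row counter rather than a measured quantity.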
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 21013520 | 1 | < 0.1% |
| 21013519 | 1 | < 0.1% |
| 21013518 | 1 | < 0.1% |
| 21013517 | 1 | < 0.1% |
| 21013516 | 1 | < 0.1% |
| 21013515 | 1 | < 0.1% |
| 21013514 | 1 | < 0.1% |
| 21013513 | 1 | < 0.1% |
| 21013512 | 1 | < 0.1% |
| 21013511 | 1 | < 0.1% |
| Other values (21013526) | 21013526 | > 99.9% |
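Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |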
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21013536 | 1 | < 0.1% |
| 21013535 | 1 | < 0.1% |
| 21013534 | 1 | < 0.1% |
| 21013533 | 1 | < 0.1% |
| 21013532 | 1 | < 0.1% |
| 21013531 | 1 | < 0.1% |
| 21013530 | 1 | < 0.1% |
| 21013529 | 1 | < 0.1% |
| 21013528 | 1 | < 0.1% |
| 21013527 | 1 | < 0.1% |
UserID
Real number (ℝ)
| Distinct | 1056079 |
|---|---|
| Distinct (%) | 5.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1433100.8 |
| Minimum | 1000001 |
| Maximum | 2063390 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.3 MiB |
Quantile statistics
| Minimum | 1000001 |
|---|---|
| 5-th percentile | 1030895 |
| Q1 | 1180506 |
| median | 1357576 |
| Q3 | 1692543 |
| 95-th percentile | 1988865 |
| Maximum | 2063390 |
| Range | 1063389 |
| Interquartile range (IQR) | 512037 |
Descriptive statistics
| Standard deviation | 308756.57 |
|---|---|
| Coefficient of variation (CV) | 0.21544651 |
| Kurtosis | -1.0436298 |
| Mean | 1433100.8 |
| Median Absolute Deviation (MAD) | 227567 |
| Skewness | 0.47199114 |
| Sum | 3.0114516 × 10¹³ |
| Variance | 9.5330619 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1084433 | 2986 | < 0.1% |
| 1034989 | 2979 | < 0.1% |
| 1070878 | 2613 | < 0.1% |
| 1048267 | 2597 | < 0.1% |
| 1160536 | 2392 | < 0.1% |
| 1000272 | 2309 | < 0.1% |
| 1006657 | 2307 | < 0.1% |
| 1019610 | 2291 | < 0.1% |
| 1107271 | 2225 | < 0.1% |
| 1035125 | 2101 | < 0.1% |
| Other values (1056069) | 20988736 | 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1000001 | 97 | < 0.1% |
| 1000002 | 29 | < 0.1% |
| 1000003 | 22 | < 0.1% |
| 1000004 | 759 | < 0.1% |
| 1000005 | 108 | < 0.1% |
| 1000006 | 11 | < 0.1% |
| 1000007 | 9 | < 0.1% |
| 1000008 | 150 | < 0.1% |
| 1000009 | 46 | < 0.1% |
| 1000010 | 436 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2063390 | 9 | < 0.1% |
| 2063389 | 37 | < 0.1% |
| 2063388 | 12 | < 0.1% |
| 2063387 | 5 | < 0.1% |
| 2063386 | 7 | < 0.1% |
| 2063385 | 10 | < 0.1% |
| 2063384 | 6 | < 0.1% |
| 2063383 | 53 | < 0.1% |
| 2063382 | 8 | < 0.1% |
| 2063381 | 5 | < 0.1% |
WineID
Real number (ℝ)
| Distinct | 100646 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 147287.3 |
| Minimum | 100001 |
| Maximum | 200795 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.3 MiB |
Quantile statistics
| Minimum | 100001 |
|---|---|
| 5-th percentile | 102025 |
| Q1 | 118159 |
| median | 155313 |
| Q3 | 168895 |
| 95-th percentile | 186393 |
| Maximum | 200795 |
| Range | 100794 |
| Interquartile range (IQR) | 50736 |
Descriptive statistics
| Standard deviation | 27655.253 |
|---|---|
| Coefficient of variation (CV) | 0.187764 |
| Kurtosis | -1.2073695 |
| Mean | 147287.3 |
| Median Absolute Deviation (MAD) | 19419 |
| Skewness | -0.16127642 |
| Sum | 3.095027 × 10¹² |
| Variance | 7.6481303 × 10⁸ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 155289 | 27415 | 0.1% |
| 179010 | 23626 | 0.1% |
| 179011 | 21216 | 0.1% |
| 111391 | 20913 | 0.1% |
| 167418 | 20817 | 0.1% |
| 162494 | 20456 | 0.1% |
| 167419 | 18823 | 0.1% |
| 135825 | 18748 | 0.1% |
| 179012 | 18575 | 0.1% |
| 167420 | 17759 | 0.1% |
| Other values (100636) | 20805188 | 99.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 100001 | 2625 | < 0.1% |
| 100002 | 10 | < 0.1% |
| 100003 | 62 | < 0.1% |
| 100004 | 110 | < 0.1% |
| 100005 | 72 | < 0.1% |
| 100006 | 1837 | < 0.1% |
| 100007 | 43 | < 0.1% |
| 100008 | 424 | < 0.1% |
| 100009 | 1971 | < 0.1% |
| 100010 | 1504 | < 0.1% |
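Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 200795 | 5 | < 0.1% |
| 200794 | 5 | < 0.1% |
| 200793 | 5 | < 0.1% |
| 200792 | 5 | < 0.1% |
| 200791 | 5 | < 0.1% |
| 200790 | 5 | < 0.1% |
| 200789 | 5 | < 0.1% |
| 200788 | 5 | < 0.1% |
| 200787 | 5 | < 0.1% |
| 200786 | 5 | < 0.1% |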
Vintage
Unsupported
Rejected, Unsupported
| Missing | 0 |
|---|---|
| Missing (%) | 0.0% |
| Memory size | 160.3 MiB |
Rating
Real number (ℝ)
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.8848798 |
| Minimum | 1 |
| Maximum | 5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 160.3 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2.5 |
| Q1 | 3.5 |
| median | 4 |
| Q3 | 4.5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 4 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.73757873 |
|---|---|
| Coefficient of variation (CV) | 0.18985883 |
| Kurtosis | 1.2489558 |
| Mean | 3.8848798 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | -0.71678179 |
| Sum | 81635060 |
| Variance | 0.54402238 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=9)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 8301655 | 39.5% |
| 3.5 | 3389567 | 16.1% |
| 5 | 2950264 | 14.0% |
| 3 | 2755661 | 13.1% |
| 4.5 | 2505818 | 11.9% |
| 2.5 | 468045 | 2.2% |
| 2 | 425593 | 2.0% |
| 1 | 152452 | 0.7% |
| 1.5 | 64481 | 0.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 152452 | 0.7% |
| 1.5 | 64481 | 0.3% |
| 2 | 425593 | 2.0% |
| 2.5 | 468045 | 2.2% |
| 3 | 2755661 | 13.1% |
| 3.5 | 3389567 | 16.1% |
| 4 | 8301655 | 39.5% |
| 4.5 | 2505818 | 11.9% |
| 5 | 2950264 | 14.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 2950264 | 14.0% |
| 4.5 | 2505818 | 11.9% |
| 4 | 8301655 | 39.5% |
| 3.5 | 3389567 | 16.1% |
| 3 | 2755661 | 13.1% |
| 2.5 | 468045 | 2.2% |
| 2 | 425593 | 2.0% |
| 1.5 | 64481 | 0.3% |
| 1 | 152452 | 0.7% |
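The per-value counts in the Rating tables are enough to recompute the frequency column, the number of observations, the sum, and the mean. The sketch below does this with the counts copied from the tables above; it is a cross-check of the reported figures, not the profiler's own code.

```python
# Counts copied from the Rating value tables in this report.
rating_counts = {
    1.0: 152_452,
    1.5: 64_481,
    2.0: 425_593,
    2.5: 468_045,
    3.0: 2_755_661,
    3.5: 3_389_567,
    4.0: 8_301_655,
    4.5: 2_505_818,
    5.0: 2_950_264,
}

n = sum(rating_counts.values())                            # total observations
weighted_sum = sum(v * c for v, c in rating_counts.items())
mean = weighted_sum / n

# Frequency of each rating as a percentage of all observations.
frequency = {v: 100 * c / n for v, c in rating_counts.items()}

for value in sorted(rating_counts):
    print(f"{value:>4}  {rating_counts[value]:>9}  {frequency[value]:5.1f}%")
print(f"n={n}  sum={weighted_sum}  mean={mean:.7f}")
```

The counts add up to the 21013536 observations listed in the overview, and the weighted mean agrees with the reported 3.8848798. Ratings of 4 alone account for roughly two fifths of all observations, which explains the median of 4 and the negative skew.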
Date
Date
| Distinct | 19746027 |
|---|---|
| Distinct (%) | 94.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 160.3 MiB |
| Minimum | 2012-01-03 08:20:53 |
| Maximum | 2021-12-31 23:59:56 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
Histogram with fixed size bins (bins=50)
Interactions
Correlations
| | Rating | RatingID | UserID | WineID |
|---|---|---|---|---|
| Rating | 1.000 | -0.076 | -0.128 | -0.017 |
| RatingID | -0.076 | 1.000 | -0.039 | 0.032 |
| UserID | -0.128 | -0.039 | 1.000 | 0.029 |
| WineID | -0.017 | 0.032 | 0.029 | 1.000 |
Missing values
Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
Sample
First rows
| | RatingID | UserID | WineID | Vintage | Rating | Date |
|---|---|---|---|---|---|---|
| 0 | 1 | 1604441 | 136103 | 1950 | 4.0 | 2019-10-14 11:20:52 |
| 1 | 2 | 1291483 | 136103 | 1950 | 5.0 | 2019-11-28 03:36:33 |
| 2 | 3 | 1070605 | 104036 | 1950 | 5.0 | 2017-12-28 10:15:55 |
| 3 | 4 | 1080181 | 144864 | 1950 | 5.0 | 2016-06-23 02:16:22 |
| 4 | 5 | 1834379 | 111430 | 1950 | 5.0 | 2021-05-16 17:58:14 |
| 5 | 6 | 1995440 | 157985 | 1950 | 4.0 | 2016-01-06 22:14:14 |
| 6 | 7 | 1166181 | 101794 | 1950 | 5.0 | 2018-04-15 12:04:46 |
| 7 | 8 | 1839846 | 136103 | 1950 | 5.0 | 2020-07-18 15:41:19 |
| 8 | 9 | 1693747 | 136103 | 1950 | 1.0 | 2018-11-23 01:48:57 |
| 9 | 10 | 1478537 | 135897 | 1950 | 4.0 | 2015-05-04 19:52:09 |
Last rows
| | RatingID | UserID | WineID | Vintage | Rating | Date |
|---|---|---|---|---|---|---|
| 21013526 | 21013527 | 1106198 | 120201 | 0 | 4.0 | 2021-07-10 07:02:24 |
| 21013527 | 21013528 | 1274281 | 112475 | 0 | 4.5 | 2019-04-14 17:36:28 |
| 21013528 | 21013529 | 1175074 | 112028 | 0 | 4.0 | 2020-02-22 08:31:35 |
| 21013529 | 21013530 | 1226184 | 174632 | 0 | 3.5 | 2019-10-07 00:10:55 |
| 21013530 | 21013531 | 1096101 | 157677 | 0 | 2.5 | 2017-04-25 00:48:56 |
| 21013531 | 21013532 | 2015383 | 113302 | 0 | 3.0 | 2019-02-16 14:15:48 |
| 21013532 | 21013533 | 1868739 | 111440 | 0 | 2.0 | 2018-09-30 16:47:05 |
| 21013533 | 21013534 | 1402947 | 142467 | 0 | 3.0 | 2021-01-29 19:21:14 |
| 21013534 | 21013535 | 1360350 | 111440 | 0 | 4.0 | 2021-07-26 14:02:14 |
| 21013535 | 21013536 | 1192603 | 111393 | 0 | 5.0 | 2016-11-17 04:48:43 |